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ABSTRACT 

The  key  objective  of  this  project  is  the  use  of  visual  information  in  a  feedback  loop,  the  underlying  problem  of  controlled  active  vision.  The 
problem  of  controlled  active  vision,  and  in  particular  visual  tracking  requires  the  integration  of  techniques  from  control  theory,  signal 
processing,  and  computer  vision.  For  some  time  now  the  role  of  control  theory  in  vision  has  been  recognized.  In  particular,  the  branches  of 
control  that  deal  with  system  uncertainty,  namely  adaptive  and  robust,  have  been  proposed  as  essential  tools  in  coming  to  grips  with  the 
problems  of  both  machine  and  biological  vision. 

Visual  tracking  provides  a  fundamental  example  of  the  need  for  controlled  active  vision.  While  tracking  in  the  presence  of  a  disturbance  is  a 
classical  control  problem,  visual  tracking  raises  new  issues.  First  since  cameras  are  part  of  the  system,  one  must  consider  the  nature  of  the 
disturbance  from  imaging  sensors.  The  feedback  signal  may  require  some  interpretation  of  the  image,  for  example  segmentation  of  a  target 
from  its  background,  or  an  inference  about  an  occluder.  In  the  project,  we  expressly  emphasize  active  vision,  because  the  result  may  be 
viewpoint  dependent.  In  particular,  calibration  may  influence  the  control  law.  And  finally,  as  visual  processing  becomes  more  complex,  the 
issue  of  processing  time  arises.  Each  of  these  problems  must  be  answered  before  target  detection,  and  visually-mediated  control  can  be 
provided  for  advanced  weapon  systems. 
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1  Statement  of  Problem 

The  key  objective  of  this  project  was  the  development  of  new  methodologies  for  employing  visual 
information  in  a  feedback  loop,  the  underlying  problem  of  controlled  active  vision.  Controlled 
active  vision,  and  in  particular  visual  tracking  requires  the  integration  of  techniques  from  control 
theory,  signal  processing,  and  computer  vision.  For  some  time  now  the  role  of  control  theory  in 
vision  has  been  recognized.  In  particular,  the  branches  of  control  that  deal  with  system  uncertainty, 
namely  adaptive  and  robust,  have  been  proposed  as  essential  tools  in  coming  to  grips  with  the 
problems  of  both  machine  and  biological  vision. 

Visual  tracking  provides  a  fundamental  example  of  the  need  for  controlled  active  vision.  While 
tracking  in  the  presence  of  a  disturbance  is  a  classical  control  problem,  visual  tracking  raises  new 
issues.  First  since  cameras  are  part  of  the  system,  one  must  consider  the  nature  of  the  disturbance 
from  imaging  sensors.  The  feedback  signal  may  require  some  interpretation  of  the  image,  e.g., 
segmentation  of  a  target  from  its  background,  or  an  inference  about  an  occluder.  In  this  project,  we 
expressly  emphasized  active  vision,  because  the  result  may  be  viewpoint  dependent.  In  particular, 
calibration  may  influence  the  control  law.  Finally,  as  visual  processing  becomes  more  complex,  the 
issue  of  processing  time  arises.  Each  of  these  problems  must  be  answered  before  target  detection, 
and  visually-mediated  control  can  be  provided  for  advanced  weapon  systems. 

There  are  a  number  of  technical  problems  connected  with  using  vision  in  a  closed  loop  setting. 
In  general,  many  of  the  mathematical  formulations  of  the  key  issues  of  image  understanding  may 
be  ill-posed.  Many  basic  vision  tasks  such  as  segmentation  and  a  robust  theory  of  shape  remain 
largely  unsolved.  For  visual  tracking,  real-time  processing  becomes  a  major  concern.  Despite  these 
formidable  obstacles,  progress  is  being  made  especially  in  integrating  control  ideas  into  the  vision 
framework.  Control  is  very  powerful  in  treating  uncertainty,  and  with  the  newer  partial  differential 
equation  methodologies,  there  is  a  strong  mathematical  and  systems-theoretic  fit.  In  fact  many  of 
the  recent  curvature  flows  may  be  derived  from  principles  of  optimal  control.  Until  very  recently, 
much  of  the  control  work  in  vision  was  performed  by  researchers  in  computer  vision  usually 
with  a  computer  science  background.  Now,  however,  researchers  with  a  strong  control/systems 
background  are  getting  more  actively  involved  given  the  opportunities  for  modern  control  ideas  in 
vision.  We  believe  that  this  welcome  development  will  significantly  help  in  alleviating  many  of  the 
formidable  problems  remaining  in  using  visual  information  in  a  feedback  loop,  and  having  more 
reliable  robust  systems  in  which  the  primary  sensor  is  based  largely  on  some  imaging  device. 

The  prevalence  of  biological  vision  in  even  very  simple  organisms,  indicates  its  inherent  util¬ 
ity  for  man-made  devices.  Cameras  are  rather  simple,  reliable  passive  sensing  devices  which  are 
quite  inexpensive  per  bit  of  data.  Furthermore  vision  can  offer  information  at  a  high  rate  with 
high  resolution  with  a  wide  field  of  view  and  accuracy  capturing  multispectral  information.  Fi¬ 
nally  cameras  can  be  used  in  a  more  active  manner.  Namely,  one  can  include  motorized  lenses 
and  mounted  on  mobile  platforms  which  can  actively  explore  the  surrounds  and  suitably  adapt 
their  sensing  capabilities.  Computer  vision  has  formulated  several  approaches  for  interpreting  the 
signals,  opening  the  possibility  that  control  laws  can  be  based  on  more  abstracted  descriptions. 
These  problems  all  become  manifest  when  one  attempts  to  use  a  visual  sensor  in  an  uncertain 
environment,  and  to  feed  back  in  some  manner  the  information.  These  issues  represent  some  of 
the  main  challenges  for  our  research  in  controlled  active  vision. 

A  major  goal  of  our  program  is  to  ensure  that  our  research  is  directly  driven  by  applications 
and  that  our  research  results  have  a  direct  impact  on  the  military  and  industrial  base.  Thus  we 
need  to  ensure  that  all  of  the  mathematically  based  results  are  amenable  to  robust  and  reliable 
computer  implementations. 

In  this  research  program,  we  continued  the  use  of  curvature-driven  partial  differential  equations 
to  explore  problems  in  visual  control  and  controlled  active  vision,  but  we  have  also  considered  the 
sensor/estimation  problem  much  more  carefully.  This  has  led  to  a  new  approach  for  tracking  in 
which  particle  filtering  was  combined  with  level  set  ideas  for  the  first  time. 
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2  Summary  of  Key  Results 

2.1  Brief  Review  of  Geometric  Flows  in  Vision  and  Image  Processing 

Curvature  driven  flows  based  on  the  the  minimization  of  certain  geometric  functionals  have  been 
used  for  a  number  of  problems  in  visual  control  and  computer  vision  in  the  past  few  years  [103, 
88,  109].  These  flows  themselves  are  very  much  motivated  by  ideas  in  optimal  control;  see  [69]. 
We  will  summarize  some  of  the  key  flows  below  which  we  will  need  in  the  sequel. 


2.1.1  Algorithms  and  PDEs 

We  briefly  review  the  major  concepts  involved  in  using  partial  differential  equations  (PDEs)  for 
image  processing  and  computer  vision. 

As  explained  in  detail  in  [20],  one  can  think  of  an  image  as  a  map  /  :  S  — >  £,  i.e.,  to  any  point 
X  in  the  domain  S,  /  associates  a  “color”  I{x)  in  a  color  space  £.  For  ease  of  presentation  we  will 
mainly  restrict  ourselves  to  the  case  of  a  two-dimensional  gray  scale  image  which  we  can  think  of 
as  a  function  from  a  domain  S  =  [0, 1]  x  [0, 1]  C  to  the  unit  interval  £  =  [0, 1]. 

The  algorithms  all  involve  solving  the  initial  value  problem  for  some  PDE  for  a  given  amount 
of  time.  The  solution  to  this  PDE  can  be  either  the  image  itself  at  different  stages  of  modification, 
or  some  other  object  (such  as  a  closed  curve  delineating  object  boundaries)  whose  evolution  is 
driven  by  the  image. 

For  example,  introducing  an  artificial  time  t,  the  image  can  be  deformed  according  to 


dl 

m 


m, 


(1) 


where  I{x,t)  :  S  x  [0,  T)  — >  £  is  the  evolving  image,  T  is  an  operator  which  characterizes  the  given 
algorithm,  and  the  initial  condition  is  the  input  image  /q.  The  processed  image  is  the  solution 
I{x,t)  of  the  differential  equation  at  time  t.  The  operator  T  usually  is  a  differential  operator, 
although  its  dependence  on  /  may  also  be  nonlocal. 

Similarly,  one  can  evolve  a  closed  curve  C  C  S  representing  the  boundaries  of  some  planar 
shape  (C  need  not  be  connected  and  could  have  several  components).  In  this  case,  the  operator 
iF  specifies  the  normal  velocity  of  the  curve  that  it  deforms.  In  many  cases  this  normal  velocity  is 
a  function  of  the  curvature  k  of  C,  and  of  the  image  I  evaluated  on  C.  A  flow  of  the  form 

dC 

-=T{I,n)N  (2) 

is  obtained,  where  M  is  the  inward  unit  normal  to  the  curve  C. 

Very  often,  the  deformation  is  obtained  as  the  steepest  descent  for  some  energy  functional.  For 
example,  the  energy 

m  =  lj  l|V/||2  dxdy  (3) 

and  its  associated  steepest  descent,  the  heat  equation, 


dl 

m 


=  AI 


(4) 


correspond  to  the  classical  Gaussian  smoothing. 

The  use  of  PDEs  allows  for  the  modelling  of  the  crucial  but  poorly  understood  interactions 
between  top-down  and  bottom-up  vision.  For  example,  in  a  variational  framework,  an  energy  £  is 
defined  globally  while  the  corresponding  operator  T  will  influence  the  image  locally.  Algorithms 
defined  in  terms  of  PDEs  treat  images  as  continuous  rather  than  discrete  objects.  This  has  a 
simplifying  effect  on  the  formalism,  which  becomes  grid  independent.  On  the  other  hand  models 
based  on  nonlinear  PDEs  may  be  much  harder  to  rigorously  analyze  and  implement. 
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2.1.2  Curve  Evolution  Using  Level  Sets 

Geometric  active  contours  evolving  according  to  an  edge  based  and/or  region  based  energy  flow 
are  very  commonly  used  for  image  segmentation.  In  these  methods,  starting  from  an  initial  esti¬ 
mate,  the  curve  deforms  under  the  influence  of  various  forces  until  it  fits  the  object  boundaries. 
The  curve  evolution  equation  is  obtained  by  reducing  an  energy  Eimage  as  fast  as  possible,  i.e.,  by 
doing  a  gradient  descent  on  Eimage-  In  general,  Eimage  niay  depend  on  a  combination  of  image 
based  features  and  external  constraints  (smoothness,  shape,  etc.);  see  [82],  [21]  and  the  references 
therein.  The  level  set  methods  of  Osher  and  Sethian  [87]  offer  a  natural  and  numerically  reliable 
implementation  of  such  curve  evolution  equations.  Level  sets  have  the  advantage  of  being  param¬ 
eter  independent  (i.e.,  they  are  implicit  representation  of  the  curve)  and  can  handle  topological 
changes  in  a  very  natural  way. 

We  now  briefly  go  over  the  level  set  representation  of  a  given  curve  evolution  equation.  Let 
C{p,t)  :  [0, 1]  X  [0,r)  ^  be  a  family  of  closed  curves  (i.e.,  C(0,t)  =  C(l,t)  for  all  t  G  [0,r)), 
satisfying  the  following  evolution  equation: 


Ik 


VM 


(5) 


where,  t  is  the  time  parameter.  The  basic  idea  of  the  level  set  approach  is  to  embed  the  contour 
C{p,t)  as  the  zero  level  set  of  a  smooth  and  Lipschitz  continuous  function  <!>  :  x  [0,T)  ^  R. 
Assume  that  <i>  is  negative  in  the  interior  and  positive  in  the  exterior  of  the  zero  level  set.  We 
consider  the  zero  level  set,  defined  by 

{Z(t)  G  R2  :  $(Z,t)  =  0}.  (6) 


We  have  to  And  an  evolution  equation  of  <i>,  such  that  the  evolving  curve  Ct  is  given  by  the  evolving 
zero  level  Z{t).  By  differentiating  (6)  with  respect  to  t  we  obtain: 


V$(Z,t) 


dZ 

Ik 


(7) 


Note  that  for  the  zero  level  set,  the  following  relation  holds: 


II  I 


-Af. 


(8) 


In  this  equation,  the  left  side  uses  terms  of  the  surface  d),  while  the  right  side  is  related  to  the 
curve  C.  The  combination  of  equations  (5)  to  (8)  gives 

^  =  I"  li  V<I>  II ,  (9) 

and  the  curve  C,  evolving  according  to  (5),  is  obtained  by  the  zero  level  set  of  the  function  <I>, 
which  evolves  according  to  (9).  This  is  a  Hamilton- Jacobi  equation  which  can  be  analyzed  using 
viscosity  theory  [27].  Finally,  given  an  initial  curve,  one  must  generate  an  initial  level  set  function. 
A  well  known  scheme  [87]  is  to  use  a  signed  distance  function. 


2.2  Active  Vision  and  Visual  Tracking 

The  use  of  active  contour  methods  was  essential  in  our  research  program  in  controlled  active  vision. 
We  now  go  over  some  of  the  key  results  pertaining  to  this  methodology. 
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2.2.1  Snakes 


The  concept  of  snakes  (also  called  deformable  or  active  contours)  was  introduced  by  Witkin, 
Kass  and  Terzopoulos  [65],  and  later  developed  by  a  number  of  researchers  (see  [26]  and  the 
references  therein).  They  may  be  used  for  edge  detection  in  the  following  manner.  Given  an 
image  /  :  S)  C  — >  £,  one  subjects  an  initial  simple  closed  parameterized  curve  C  :  [0, 1]  ^  D  to 
a  steepest  descent  flow  for  an  energy  functional  of  the  form: 

m  =  { 2^i(p)ii4pf  +  + w{c{p))^  dp.  (10) 


(Here  the  p  subscript  indicates  differentiation  with  respect  to  p.)  The  first  two  terms  control  the 
smoothness  of  the  active  contour  C.  The  contour  interacts  with  the  image  through  the  potential 
function  IT  :  2)  — >  R  which  is  chosen  to  be  small  near  the  edges,  and  large  everywhere  else  (it  is 
a  decreasing  function  of  some  edge  detector).  For  example,  one  could  take: 


l  +  ||VG.*/(a:)||2’ 


(11) 


where  denotes  a  Gaussian  Alter  of  standard  deviation  a. 

Minimizing  £  will  therefore  attract  C  toward  the  edges.  The  gradient  flow  is  the  fourth  order 
nonlinear  parabolic  equation 

dC 

=  -  {w2{p)Cpp)^^  +  {wi{p)Cp)^  +  VW{C{p,t)).  (12) 

This  approach  gives  reasonable  results;  see  [78]  for  a  survey  of  snakes  in  medical  image  analysis. 
One  limitation  however  is  that  the  active  contour  or  snake  cannot  change  topology,  i.e.,  it  starts 
out  being  a  topological  circle  and  it  will  always  remain  a  topological  circle  and  will  not  be  able 
to  break  up  into  two  or  more  pieces,  even  if  the  image  would  contain  two  unconnected  objects 
and  this  would  give  a  more  natural  description  of  the  edges.  Special  ad  hoc  procedures  have  been 
developed  in  order  to  handle  splitting  and  merging  [79]. 


2.2.2  Geometric  Active  Contours 

Another  disadvantage  of  the  snake  method  is  that  it  explicitly  involves  the  parametrization  of  the 
active  contour  C,  while  there  is  no  obvious  relation  between  the  parametrization  of  the  contour 
and  the  geometry  of  the  objects  to  be  captured.  Geometric  models  have  been  developed  in  [18]  to 
address  this  issue. 

As  in  the  snake  framework,  one  deforms  the  active  contour  C  by  a  velocity  which  is  essentially 
defined  by  a  curvature  term,  and  a  constant  inflationary  term  weighted  by  a  stopping  function 
W .  By  formulating  everything  in  terms  of  quantities  which  are  invariant  under  reparametrization 
(such  as  the  curvature  and  normal  velocity  of  C)  one  obtains  an  algorithm  which  does  not  depend 
on  the  parametrization  of  the  contour.  In  particular,  it  can  be  implemented  using  level  sets. 
More  specifically,  the  model  of  [18]  is  given  by 

V  =  W{x){k  +  c),  (13) 

where  both  the  velocity  V  and  the  curvature  k.  are  measured  using  the  inward  normal  Af  for  C. 
Here,  as  previously,  IT  is  small  at  edges  and  large  everywhere  else,  and  c  is  a  constant,  called  the 
inflationary  parameter.  When  c  is  positive,  it  helps  push  the  contour  through  concavities,  and 
can  speed  up  the  segmentation  process.  When  it  is  negative,  it  allows  expanding  “bubbles,”  i.e., 
contours  which  expand  rather  than  contract  to  the  desired  boundaries.  We  should  note  that  there 
is  no  canonical  choice  for  the  constant  c,  which  has  to  be  determined  experimentally. 
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In  practice,  C  is  deformed  using  the  Osher-Sethian  level  set  method  described  in  Section  2.1.2. 
Geometric  active  contours  have  the  advantage  that  they  allow  for  topological  changes  (splitting 
and  merging)  of  the  active  contour  C.  The  main  problem  with  this  model  is  that  the  desired  edges 
are  not  steady  states  for  the  flow  (13).  The  effect  of  the  factor  lT(a;)  is  merely  to  slow  the  evolving 
contour  Ct  down  as  it  approaches  an  edge,  but  it  is  not  the  case  that  the  Ct  will  eventually  converge 
to  anything  like  the  sought-for  edge  as  t  ^  oo.  Some  kind  of  artificial  intervention  is  required  to 
stop  the  evolution  when  Ct  is  close  to  an  edge. 


2.2.3  Conformal  (Geodesic)  Active  Contours 

In  [66,  19],  the  authors  propose  a  novel  technique  that  is  both  geometric  and  variational.  In  this 
approach  one  defines  a  Riemannian  metric  on  D  from  a  given  image  /  :  2)  — >  R,  by  conformally 
changing  the  standard  Euclidean  metric  to, 

=  W{x)^\\dx\\^.  (14) 

Here  W  is  defined  as  above  in  (11).  The  length  of  a  curve  in  this  metric  is 


=  J  W{C{s))  ds.  (15) 

(Here  ds  denotes  arc-length.)  Curves  which  minimize  this  length  will  prefer  to  be  in  regions  where 
W  is  small,  which  is  exactly  where  one  would  expect  to  And  the  edges.  So,  to  And  edges,  one 
should  minimize  the  IT-weighted  length  of  a  closed  curve  C,  rather  than  some  “energy”  of  C  (which 
depends  on  a  parametrization  of  the  curve) . 

To  minimize  {C),  one  computes  a  gradient  flow  in  the  sense.  Since  the  first  variation  of 
this  length  functional  is  given  by 


dC'^{C) 

dt 


V{WK-J\f  -S/W}  ds. 


where  V  is  the  normal  velocity  measured  in  the  Euclidean  metric,  and  Af  is  the  Euclidean  unit 
normal,  the  corresponding  gradient  flow  is 


Vconf  =  Wk  -  Af  -VW.  (16) 

Note  that  this  is  not  quite  the  curve  shortening  flow  in  the  sense  of  [48,  49]  on  given  the 
Riemannian  manifold  structure  defined  by  the  conformally  Euclidean  metric  g^ .  Indeed,  a  simple 
computation  shows  that  in  that  case  one  would  have 


V  =  W-^{WK-Af  -S/W).  (17) 

Thus  the  term  “geodesic  active  contour”  used  in  [19]  is  a  bit  of  a  misnomer,  and  so  we  prefer  the 
term  “conformal  active  contour”  as  in  [66].  However,  following  standard  practice  in  the  vision 
community,  we  will  use  both  terms  interchangeably  in  this  report. 

Contemplation  of  the  conformal  active  contours  leads  to  another  interpretation  of  the  concept 
“edge.”  Using  the  landscape  metaphor  one  can  describe  the  graph  of  lU  as  a  plateau  (where  ||V/|| 
is  small)  in  which  a  canyon  has  been  carved  (where  |1V/||  is  large).  The  edge  is  to  be  found  at  the 
bottom  of  the  canyon.  Now  if  IT  is  a  Morse  function,  then  one  expects  the  “bottom  of  the  canyon” 
to  consist  of  local  minima  of  W  alternated  by  saddle  points.  The  saddle  points  are  connected  to 
the  minima  by  their  unstable  manifolds  for  the  gradient  flow  of  W  (the  ODE  af  =  —WW{x).) 
Together  these  unstable  manifolds  form  one  or  more  closed  curves  which  one  may  regard  as  the 
edges  which  are  to  be  found. 
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Comparing  (16)  to  the  evolution  of  the  geometric  active  contour  (13)  we  see  that  we  have  the 
new  term  —Af  ■  VW,  the  normal  component  of  — VVC.  If  the  contour  Ct  were  to  evolve  only  by 
V  =  —N  ■  VW,  then  it  would  simply  be  deformed  by  the  gradient  flow  of  W.  If  lb  is  a  Morse 
function,  then  one  can  use  the  A-lemma  from  dynamical  systems  [89,  81]  to  show  that  for  a  generic 
choice  of  initial  contour  the  Ct  will  converge  to  the  union  of  unstable  manifolds  “at  the  bottom 
of  the  canyon,”  possibly  with  multiplicity  more  than  one.  The  curvature  term  in  (13)  counteracts 
this  possible  doubling  up  and  guarantees  that  Ct  will  converge  smoothly  to  some  curve  which  is  a 
smoothed  out  version  of  the  heteroclinic  chain. 

2.2.4  Conformal  Area  Minimizing  Flows 

Typically,  in  order  to  get  expanding  bubbles,  an  inflationary  term  is  added  in  the  model  (16)  as 
in  (13).  Many  times  segmentations  are  more  easily  performed  by  seeding  the  image  with  bubbles 
rather  than  contracting  snakes.  The  conformal  active  contours  will  not  allow  this  since  very  small 
curves  will  simply  shrink  to  points  under  the  flow  (16).  To  get  a  curve  evolution  which  will  force 
small  bubbles  to  expand  and  converge  toward  the  edges,  it  is  convenient  to  subtract  a  weighted 
area  term  from  the  length  functional  namely 

A^{C)  =  [  W{x)dx 
JRc 

where  da;  is  2D  Lebesgue  measure,  and  Rc  is  the  region  enclosed  by  the  contour  C. 

The  first  variation  of  this  weighted  area  is  [106,  92]): 

±A'^{Ct)  =  - I^W{C{s))Vds  (18) 

where,  as  before,  V  is  the  normal  velocity  of  Ct  measured  with  the  inward  normal. 

The  functional  which  one  now  tries  to  minimize  is 

S^{C)  =  C^{C)  +  cA^{C),  (19) 

where  c  G  R  is  a  constant  called  the  inflationary  parameter. 

To  obtain  steepest  descent  for  one  sets 

Kct  =  Konf  +  cW  =  {n  +  c)W{x)  -  M  -  VW.  (20) 

For  c  =  1  this  is  a  conformal  length/area  minimizing  flow  (see  [106]).  As  in  the  model  of  [18] 
the  inflationary  parameter  c  may  be  chosen  as  positive  (snake  or  inward  moving  flow)  or  negative 
(bubble  or  outward  moving  flow). 

In  practice,  for  expanding  flows  (negative  c,  weighted  area  maximizing  flow),  one  expands  the 
bubble  just  using  the  inflationary  part 

V  =  cW 

until  the  active  contour  is  sufficiently  large,  and  then  “turns  on”  the  conformal  part  Vjjonf  which 
brings  the  contour  to  its  final  position.  Again  as  in  [18],  the  curvature  part  of  Fact  also  acts 
to  regularize  the  flow.  Finally,  there  is  a  detailed  mathematical  analysis  of  (20)  in  [66]  as  well 
as  extensions  to  three  dimensional  space  in  which  case  the  curvature  k  is  replaced  by  the  mean 
curvature  H  in  equation  (20). 


2.3  Particle  Filters 

We  have  proposed  the  use  of  particle  filters  in  conjunction  with  geometric  active  contours. 


Let  Xt  G  R"  be  a  state  vector  evolving  according  to  the  following  equation: 

Xt+l  =  ft{xt,ut) 

where  Ut  is  i.i.d.  random  noise  with  known  pdf.  At  discrete  times,  observations  Yt  G  become 
available.  These  measurements  are  related  to  the  state  vector  via  the  observation  equation: 


Yt  =  ht{xt,vt) 


where  Vt  is  measurement  noise  which  is  known  a  priori.  It  is  assumed  that  the  initial  state 
distribution  denoted  by  7ro(dx),  the  state  transition  kernel  by  Kt(xt,  dxt+i)  and  the  observation 
likelihood  given  the  state,  by  gt(Ytjxt)  are  known.  The  particle  filter  (PF)  [47,  36]  is  a  sequential 

Monte  Carlo  method  which  produces  at  each  time  t,  a  cloud  of  n  particles,  whose 

empirical  measure  closely  “follows”  7rt((ixt|Fo:t))  the  posterior  distribution  of  the  state  given  past 
observations  (denoted  by  T:t\t{dx)). 

The  PF  was  first  introduced  in  [47]  as  the  Bayesian  Bootstrap  filter  and  its  first  application 
to  tracking  in  computer  vision  was  the  CONDENSATION  algorithm  [59].  The  particle  filter 
[36]  recursively  approximates  the  posterior  distribution  of  the  state  at  any  time  t  given  the  past 
observations,  by  Monte  Carlo  sampling.  It  works  for  any  linear  or  non-linear,  Gaussian  or  non- 
Gaussian  dynamical  system  for  which  ttq,  Kt(xt,  dxt+i)  is  known  and  can  be  sampled  from  and 
gt(ytlxt)  is  known. 

The  PF  starts  with  sampling  n  times  from  the  initial  state  distribution  7ro(fix)  to  approxi¬ 
mate  it  by  TTQ{dx)  =  and  then  implements  the  Bayes’  recursion  at  each  time 

step.  Now,  the  distribution  of  xt-i  given  observations  up  to  time  t  —  I  can  be  approximated 
by  ~  n  S”=i  '^3,(0  (dx).  The  prediction  step  samples  the  new  state  from  the 

distribution  Kt-i(xt^2i,  ■)■  The  empirical  distribution  of  this  new  cloud  of  particles,  = 

h  approximation  to  the  conditional  probability  distribution  of  Xt  given  obser¬ 

vations  up  to  time  t  —  1  {prediction  distribution) . 

In  the  update  step,  each  particle  is  weighted  in  proportion  to  the  likelihood  of  the  observation 
at  t,  Yt,  i.e. 

H)  ^  9t{YM"^) 

7fJ|j(dx)  =  i  X]r=i  then  an  estimate  of  TTt\t  {filtering  distribution).  One  resamples  n 

times  with  replacement  from  n'^’^fidx)  to  obtain  the  empirical  estimate  ir'^’^fidx)  =  ^ 

Note  that  both  7fJ|j  and  TrJjj  approximate  7rq(  but  the  resampling  step  is  used  because  it  increases 
the  sampling  efficiency  by  eliminating  samples  with  very  low  weights. 

2.4  Optimal  Transport 

The  mass  transport  problem  was  first  formulated  by  Caspar  Monge  in  1781,  and  concerned  finding 
the  optimal  way,  in  the  sense  of  minimal  transportation  cost,  of  moving  a  pile  of  soil  from  one 
site  to  another.  This  problem  was  given  a  modern  formulation  in  the  work  of  Kantorovich  [64], 
and  so  is  now  known  as  the  Monge-Kantorovich  problem.  The  problem  of  optimal  tranpsort  has 
appeared  in  econometrics,  fluid  dynamics,  automatic  control,  transportation,  statistical  physics, 
shape  optimization,  expert  systems,  and  meteorology  [94]. 

Very  importantly  optimal  transport  naturally  fits  into  certain  problems  in  computer  vision 
[41].  In  particular,  for  the  general  visual  tracking  problem,  a  robust  and  reliable  object  and  shape 
recognition  system  is  of  major  importance.  A  key  way  to  carry  this  out  is  via  template  matching, 
which  is  the  matching  of  some  object  to  another  within  a  given  catalogue  of  objects.  Typically, 
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the  match  will  not  be  exact  and  hence  some  criterion  is  necessary  to  measure  the  “goodness  of 
fit.”  For  a  description  of  various  matching  procedures,  see  [53]  and  the  references  therein.  The 
matching  criterion  can  also  be  considered  a  shape  metric  for  measuring  the  similarity  between  two 
objects.  In  fact,  in  his  doctoral  thesis  [41],  Fry  uses  optimal  transport  ideas  to  construct  precisely 
such  a  shape  metric  for  planar  shapes  whose  boundaries  can  be  described  as  closed  contours. 

2.4.1  Registration  and  Optimal  Transport 

We  have  shown  how  ideas  from  optimal  transport  may  be  used  to  formulate  a  new  approach  to 
image  interpolation  and  in  particular,  new  approaches  to  the  computation  of  optical  flow,  image 
morphing,  and  registration. 

Image  registration  is  a  process  of  aligning  images  so  that  corresponding  features  can  be  easily 
related.  The  images  may  come  from  different  modalities  or  from  the  same  modality  at  different 
times  or  both.  The  collection  of  papers  in  [111]  gives  an  excellent  overview  of  the  field. 

Registration  proceeds  in  several  steps.  First,  each  image  or  data  set  to  be  matched  should  be 
individually  calibrated,  corrected  for  imaging  distortions  and  artifacts,  and  cleared  of  noise.  Next, 
a  measure  of  similarity  between  the  data  sets  must  be  established,  so  that  one  can  quantify  how 
close  an  image  is  from  another  after  transformations  are  applied.  Such  a  measure  may  include 
the  similarity  between  pixel  intensity  values,  as  well  as  the  proximity  of  predefined  image  features 
such  as  implanted  fiducials,  anatomical  landmarks,  surface  contours,  and  ridge  lines.  Next,  the 
transformation  that  maximizes  the  similarity  between  the  transformed  images  is  found.  Often  this 
transformation  is  given  as  the  solution  of  an  optimization  problem  where  the  transformations  to 
be  considered  are  constrained  to  be  of  a  predetermined  class  U.  In  the  case  of  rigid  registration,  U 
is  the  set  of  Euclidean  transformations.  Many  deformations  are  not  of  the  class  (e.g.,  the  swelling 
of  tissues  in  the  body  or  the  elastic  part  of  the  motion  of  a  jelly  fish).  Therefore  a  more  realistic 
and  more  challenging  problem  is  elastic  registration  where  lA  is  the  set  of  smooth  diffeomorphisms. 
For  example,  in  the  medical  imaging  context,  due  to  anatomical  variability,  elastic  deformation 
maps  are  also  useful  when  comparing  images  from  different  patients.  Finally,  once  an  optimal 
transformation  is  obtained,  it  is  used  to  fuse  the  image  data  sets. 

2.4.2  Variational  Approach  to  Monge-Kantorovich  Problem 

There  have  been  a  number  of  algorithms  considered  for  computing  an  optimal  transport  map. 
For  example,  methods  have  been  proposed  based  on  linear  programming  [94],  and  on  Lagrangian 
mechanics  closely  related  to  ideas  from  the  study  of  fluid  dynamics  [13].  An  interesting  geometric 
method  has  been  formulated  by  Cullen  and  Purser  [29] .  In  our  case  for  image  tracking,  an  effective, 
robust  method  may  be  based  on  gradient  descent  and  the  concept  of  “polar  factorization”;  see 
[15,  42,  76]  for  details  about  polar  factorizations. 

In  order  to  motivate  our  approach  for  computing  an  optimal  transport  map  developed  in 
our  Army  Research  Office  sponsored  program,  we  will  consider  the  specific  problem  of  elastic 
registration  as  in  [52,  9]  in  which  the  similarity  between  two  images  is  measured  by  their 
Kantorovich- Wasserstein  distance.  Finding  “the  best  map”  from  one  image  to  another  then  leads 
to  an  optimal  transport  problem. 

In  the  approach  of  [52]  one  thinks  of  a  gray  scale  image  as  a  Borel  measure^  /x  on  some  open 
domain  S  C  where,  for  any  E  C  T),  the  “amount  of  white”  in  the  image  contained  in  E  is 
given  by  fJ,{E).  Given  two  images  (Sq,  ^o)  and  (S)i,  /Xi)  one  considers  all  maps  u  :  Sq  — >  Si  which 
preserve  the  measures,  i.e.  which  satisfy^ 


(21) 

^This  can  be  physically  motivated.  For  example,  in  some  forms  of  MRI  the  image  intensity  is  the  actual  proton 
density. 

X  and  y  are  sets  with  cr-algebras  A4  and  J\f,  and  if  /  :  X  — >  Y  is  a  measurable  map,  then  we  write  for 
the  pushforward  of  any  measure  p-  on  {X,Ai),  i.e.,  for  any  measurable  E  CY  we  define  =  fi[f~^{E)). 
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and  one  is  required  to  find  the  map  (if  it  exists)  which  minimizes  the  Monge-Kantorovich  transport 
functional  (see  the  exact  definition  below). 

More  precisely,  assuming  that  the  cost  of  moving  a  mass  m  from  x  G  to  y  G  \s  m-^{x,y), 
where  :  R'^  x  R*^  ^  R+  is  called  the  cost  function,  Kantorovich  [64]  defined  the  cost  of  moving 
the  measure  yo  to  the  measure  /ii  by  the  map  u  :  Do  — >  Di  to  be 

M{u)  =  I  ^{x,u{x))  dfio{x).  (22) 

J-So 


The  infimum  of  M{u)  taken  over  all  measure  preserving  u  :  (Do, Mo)  ^  (®i,Mi)  is  called  the 
Kantorovich-Wasserstein  distance  between  the  measures  ^o  and  /ii.  The  minimization  of  M{u) 
constitutes  the  mathematical  formulation  of  the  Monge-Kantorovich  optimal  transport  problem. 

If  the  measures  yi  and  Lebesgue  measure  d£  are  absolutely  continuous  with  respect  to  each 
other,  so  that  we  can  write  dfii  =  mi{xi)dL  for  certain  strictly  positive  densities  mi  G  T^(Di,d£), 
and  if  the  map  u  is  a  diffeomorphism  from  Do  to  Di,  then  the  measure  preservation  property  (21) 
is  equivalent  with  mass  preservation: 

mo{x)  =  det[Du{x))  ■  mi{u{x)),  (for  almost  all  xG  Do).  (23) 

The  Jacobian  equation  (23)  implies  that  if  a  small  region  in  Do  is  mapped  to  a  larger  region  in 
Di,  there  must  be  a  corresponding  decrease  in  density  in  order  to  comply  with  mass  preservation. 
In  other  words,  expanding  an  image  darkens  it. 

The  Monge-Kantorovich  problem  corresponds  to  the  cost  function  4)(a;,  y)  —  ^Ha;—  r/|p.  A 
fundamental  theoretical  result  for  the  case  [15,  43,  70]  is  that  there  is  a  unique  optimal  mass 
preserving  u,  and  that  this  u  is  characterized  as  the  gradient  of  a  convex  function  p,  i.e.,  u  =  Vp. 
General  results  about  existence  and  uniqueness  may  be  found  in  [4]  and  the  references  therein. 

To  reduce  the  Monge-Kantorovich  cost  M{u)  of  a  map  :  Do  — >  Di,  in  [52]  we  consider  a 
rearrangement  of  the  points  in  the  domain  of  the  map  in  the  following  sense:  the  initial  map  is 
replaced  by  a  family  of  maps  u*  for  which  one  has  m*  o  s*  =  for  some  family  of  diffeomorphisms 
s*  :  Do  — >  Dq.  If  the  initial  map  sends  the  measure  po  to  pi,  and  if  the  diffeomorphisms  s* 
preserve  the  measure  po,  then  the  maps  u*  =  o  {s*)~^  will  also  send  po  to  yi. 

Any  sufficiently  smooth  family  of  diffeomorphisms  s*  :  Dq  — >  Do  is  determined  by  its  velocity 
field,  defined  by  dts^  =  v*  o  s*. 

If  M*  is  any  family  of  maps,  then,  assuming  rt^po  =  Mi  for  all  t,  one  has 

^M{u*)=[  {<^,c{x,u\x)),v\x))  dyo{x).  (24) 

Jvo 

The  steepest  descent  is  achieved  by  a  family  s‘  G  Diff^^(Do),  whose  velocity  is  given  by 

V*  =  {<^,,{x,u*{x)))  .  (25) 

Here  V  is  the  Helmholtz  projection,  which  extracts  the  divergence-free  part  of  vector  fields  on  Do, 
i.e.,  for  any  vector  field  w  on  D  one  has  w  =  V[w]  +  Vp. 

From  =  u*  o  s*  one  gets  the  transport  equation 

^  +  v*.  Vm*  =  0  (26) 

where  the  velocity  field  is  given  by  (25).  In  [9],  it  is  shown  that  the  initial  value  problem  (25), 
(26)  has  short  time  existence  for  (7^’“  initial  data  and  a  theory  of  global  weak  solutions  in  the 
style  of  Kantorovich  is  developed. 
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For  image  registration,  it  is  natural  to  take  y)  =  ^\\x—  y|p  and  So  =  Si  to  be  a  rectangle 
[52] .  Extensive  numerical  computations  show  that  the  solution  to  the  unregularized  flow  converges 
to  a  limiting  map  for  a  large  choice  of  measures  and  initial  maps.  Indeed,  in  this  case,  we  can 
write  the  minimizing  flow  in  the  following  “nonlocal”  form: 

^  = - {u*  +  V(-A)-1  div(M*)}  •  Vu‘.  (27) 

Ot  UIq 

The  technique  has  been  mathematically  justified  in  [9]  to  which  we  refer  the  reader  for  all  of 
the  relevant  details. 

2.4.3  Other  Applications  of  Optimal  Transport 

In  our  research,  we  focused  on  the  uses  of  ideas  from  optimal  transport  for  problems  in  controlled 
active  vision  and  visual  tracking.  However,  given  the  potential  power  of  these  ideas  in  systems 
and  control,  we  would  like  to  list  some  other  applications  of  Monge-Kantorovich: 

1.  Lyapunov  theory  is  essential  is  studying  nonlinear  system  stability  and  controller  synthesis. 
In  some  very  interesting  work,  Rantzer  [95]  has  formulated  a  dual  to  Lyapunov’s  second 
theorem.  The  idea  is  that  the  Lyapunov  function  is  regarded  as  the  “cost  to  go”  in  an 
optimal  transport  problem  and  is  dual  (in  the  sense  of  linear  programming)  to  the  density 
function  typically  studied  in  Monge-Kantorovich  theory.  These  ideas  give  a  powerful  new 
tool  in  studying  nonlinear  system  analysis. 

2.  Shape  optimization  is  another  area  of  use  for  optimal  transport  [39].  For  example,  given 
two  densities  and  an  insulating  medium  into  which  we  place  a  fixed  amount  of  conducting 
material  one  can  consider  the  problem  of  the  optimal  placement  of  the  conducting  material 
to  minimize  the  heating  induced  by  the  flow.  This  can  be  put  into  the  Monge-Kantorovich 
optimal  transport  framework.  Similar  remarks  apply  to  problems  in  compression  molding, 
where  one  considers  an  incompressible  plastic  material  being  pressed  being  two  plates  in 
which  one  wants  to  track  the  air-plastic  interface. 

3.  One  of  the  most  beautiful  applications  of  optimal  transport  is  in  meteorology,  in  particular, 
semigeostrophic  models.  These  are  concerned  with  with  large  scale  stratified  flows  and  front 
formation  [29] .  The  idea  is  that  meteorologists  want  to  model  how  fronts  arise  in  large-scale 
weather  patterns.  Tracking  such  fronts  is  a  key  goal,  and  semigeostrophic  equations  seem  to 
give  a  reasonable  mathematical  model  for  the  creation  of  such  fronts.  This  leads  naturally 
to  optimal  mass  transport  equations. 
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